Commentary: Whole-genome disassembly
Abstract
The race to sequence the human genome has garnered a level of popular attention unprecedented for a scientific endeavor. This fascination has partly been caused, of course, by the importance of the goal; but it also reflects the Olympian nature of the contest, which opposed two capable teams with sharply contrasting cultures (public and private), personalities, and strategies. Titanic struggles being the stuff of mythology, it should perhaps not surprise us that a number of myths regarding this race have already emerged. In a recent issue of PNAS, Waterston et al. (1), leaders of the public effort, help to dispel one of these myths, involving the controversial “whole-genome shotgun” strategy used by Celera.
Issues surrounding sequencing strategies will no doubt seem arcane to most readers but are worth considering, if only because they may significantly influence the pace and cost of DNA sequencing during the remainder of the Genome Era. That a strategy is needed at all arises from the fact that a sequencing “read,” the tract of data obtainable in a single experimental run, is only a few hundred bases in length and contains errors. Getting reliable sequence of a larger DNA segment therefore requires a method for generating and piecing together a number of reads covering the segment. Since its introduction by Sanger and colleagues over 20 years ago, the favored method for this purpose has comprised the following steps: an initial “shotgun” phase, in which reads are derived from subclones essentially randomly located within the targeted region; an assembly phase, in which read overlaps are determined (the main challenge here being to identify and discard false overlaps arising from repeated sequences) and used to approximately reconstruct the underlying sequence; and a finishing phase, in which additional reads are obtained in directed fashion to close gaps and shore up data quality where needed.
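Why a random shotgun phase leaves gaps for the finishing phase to close can be illustrated with a standard idealization that the commentary itself does not spell out. If reads are assumed to land uniformly at random and to be error-free (a Poisson, or Lander-Waterman style, model), the fraction of bases left uncovered at c-fold redundancy is roughly e^(-c), and the expected number of separate contigs is roughly N·e^(-c) for N reads. The sketch below is only that back-of-the-envelope model; the read length of 500 bases and the 150,000 bp target are illustrative assumptions, and all function names are hypothetical.

```python
import math


def reads_needed(target_bp: int, read_bp: int, coverage: float) -> int:
    """Number of reads required to reach a given fold coverage of the target."""
    return math.ceil(coverage * target_bp / read_bp)


def uncovered_fraction(coverage: float) -> float:
    """Expected fraction of bases hit by no read (Poisson approximation)."""
    return math.exp(-coverage)


def expected_contigs(target_bp: int, read_bp: int, coverage: float) -> float:
    """Expected number of contigs, ignoring the minimum overlap needed
    to detect that two reads join (an idealizing assumption)."""
    n_reads = reads_needed(target_bp, read_bp, coverage)
    return n_reads * math.exp(-coverage)


if __name__ == "__main__":
    TARGET_BP = 150_000  # assumed target size, for illustration only
    READ_BP = 500        # assumed read length ("a few hundred bases")
    for c in (1, 2, 4, 6, 8):
        print(
            f"coverage {c}x: {reads_needed(TARGET_BP, READ_BP, c)} reads, "
            f"{uncovered_fraction(c):.4%} uncovered, "
            f"about {expected_contigs(TARGET_BP, READ_BP, c):.1f} contigs"
        )
```

Under these assumptions, each additional fold of coverage shrinks the uncovered fraction by a factor of e, so added redundancy trades extra shotgun reads for less directed finishing work. Real data depart from the model because of cloning bias, read errors, and repeats.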
The shotgun phase usually involves obtaining a substantial redundancy of read coverage of the target, typically at least 6–8-fold, to minimize the amount of work required during the labor-intensive finishing phase.
For the human genome, which comprises some 3 billion base pairs, the public effort adopted a well-tested modular approach in which large fragments of the genome (roughly 150,000 bp in size) were first cloned into a bacterial host (as bacterial artificial chromosomes, or BACs) and then sequenced individually by the shotgun method. Among other advantages, this “clone by clone” strategy simplifies the assembly problem (by reducing its scale and the likelihood of errors caused by repeats), generates substantial sequence tracts of known contiguity that can be mapped relatively efficiently back to the genome, and yields resources that are useful in the finishing stage and for independent tests of assembly accuracy. A “draft” version of the genome sequence (based on a somewhat lower shotgun depth of coverage for most of the clones) obtained in this way was published last year (2).
In contrast, Celera adopted a whole-genome shotgun approach, which purports to accelerate the above process by bypassing the intermediate step of cloning large fragments and instead deriving reads directly from the whole genome. The process is clearly riskier because of the significantly greater possibility of assembly error, but had been successfully used by Celera to produce a near-complete sequence of the Drosophila genome (3, 4) with about 2,500 gaps. Its ability to cope with the human genome, which is 30-fold larger and much richer in repetitive sequences than Drosophila, remained unclear. Against all odds, Celera demonstrated that it worked (5), producing an independent human genome sequence of quality comparable to or higher than that obtained by the public effort. Or did they? This is the myth that Waterston et al. (1) overturn.
Far from constructing an independent sequence, Celera incorporated the public data in three important ways into their “whole genome assembly.” (i) The assembled BAC sequences from the public project were “shredded” in a manner that (as Waterston et al. show) retained nearly all of the information from the original sequence, and used as input. (ii) In a process called “external gap walking,” unshredded, assembled, public BAC sequences were used to close gaps. (iii) Public mapping data were used to anchor sequence islands to the genome. As a result, the assembly reported by Celera cannot be viewed as a true whole-genome shotgun assembly. Moreover, accuracy tests in ref. 5, which involved comparison of Celera’s assembly to finished portions of the public sequence, are virtually meaningless because the finished sequence was itself used in constructing the Celera assembly. We are left with no idea how a true whole-genome assembly would have performed. It is striking, however, that even with this use of the public data, what Celera calls a whole-genome assembly was a failure by any reasonable standard: 20% of the genome is either missing altogether or is in the form of 116,000 small islands of sequence (averaging 2.3 kb in size) that are unplaced, and for practical purposes unplaceable, on the genome.
Several other myths beyond the one discussed by Waterston et al. have become widely accepted. One is that the whole-genome shotgun approach was in large measure responsible for Celera’s rapid pace in sequencing the Drosophila and human genomes. In fact, their great speed was mainly because of the acquisition of a huge, unprecedented sequencing capacity (some 200 capillary machines, each able to produce 500–1000 reads per day) as a result of their corporate ties with a manufacturer of these machines.
That this was really the key factor is evident from the fact that when the public effort acquired similar capacity, they were able to attain a comparable or higher throughput by using the clone by clone approach.
A third myth is that the whole-genome approach saves money. Although definitive judgement here should await a rigorous cost accounting, the basic economics of sequencing by the clone by clone approach have apparently not changed greatly over the past 5 or 6 years. Less than 10% of the overall cost goes to BAC mapping and subclone library construction, 50–60% to the shotgun itself (assuming a coverage of 6–8×), and the remaining 30–40% goes to finishing. Even if it works as intended, the whole-genome approach can save at best the 10% involved
Publication date: 2002